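Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.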
translated by Google Translate
A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
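The linearization can be checked numerically on a toy model. This is a minimal pure-Python sketch, not the authors' code. It makes three assumptions:

- a hypothetical two-layer tanh network with scalar input and 1/sqrt(n) output scaling;
- one small, hypothetical parameter update along the output gradient;
- a comparison of the updated network's output with its first-order Taylor expansion f(x; θ0) + ∇θ f(x; θ0)·(θ − θ0).

```python
import math
import random


def forward(x, w, v):
    """Hypothetical two-layer tanh network: scalar input, 1/sqrt(n) output scaling."""
    n = len(w)
    return sum(vj * math.tanh(wj * x) for wj, vj in zip(w, v)) / math.sqrt(n)


def gradient(x, w, v):
    """Analytic gradient of forward() with respect to the weights w and v."""
    s = math.sqrt(len(w))
    grad_w = [vj * (1.0 - math.tanh(wj * x) ** 2) * x / s for wj, vj in zip(w, v)]
    grad_v = [math.tanh(wj * x) / s for wj in w]
    return grad_w, grad_v


def linearized(x, w0, v0, w, v):
    """First-order Taylor expansion of forward() around (w0, v0), evaluated at (w, v)."""
    gw, gv = gradient(x, w0, v0)
    delta = sum(g * (a - b) for g, a, b in zip(gw, w, w0))
    delta += sum(g * (a - b) for g, a, b in zip(gv, v, v0))
    return forward(x, w0, v0) + delta


random.seed(0)
n = 4096  # hidden width; the approximation is expected to tighten as n grows
w0 = [random.gauss(0.0, 1.0) for _ in range(n)]
v0 = [random.gauss(0.0, 1.0) for _ in range(n)]
x = 0.7

# Hypothetical small update: one step of size eta along the output gradient at x.
eta = 0.1
gw, gv = gradient(x, w0, v0)
w = [a + eta * g for a, g in zip(w0, gw)]
v = [a + eta * g for a, g in zip(v0, gv)]

exact = forward(x, w, v)
approx = linearized(x, w0, v0, w, v)
print(f"network: {exact:.6f}  linearized: {approx:.6f}")
```

With a wide hidden layer the two printed values should agree closely. Shrinking n or enlarging eta makes the higher-order terms, and hence the gap, tend to grow.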
It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally, we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.
* Both authors contributed equally to this work. † Work done as a member of the Google AI Residency program (g.co/airesidency).
1 Throughout this paper, we assume the conditions on the parameter distributions and nonlinearities are such that the Central Limit Theorem will hold; for instance, that the weight variance is scaled inversely proportional to the layer width.
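For ReLU nonlinearities, the layer-by-layer GP covariance has a closed form: the arc-cosine recursion. The sketch below computes it and then uses it for GP regression, so it shows what "Bayesian inference with the corresponding GP" amounts to. It is an illustration under stated assumptions, not the authors' efficient pipeline:

- sigma_w2, sigma_b2, and the noise term are hypothetical hyperparameters;
- the small dense solver stands in for a proper linear-algebra library.

```python
import math


def nngp_relu_kernel(x1, x2, depth, sigma_w2=2.0, sigma_b2=0.1):
    """GP covariance of a fully connected ReLU network with `depth` hidden layers
    (closed-form arc-cosine recursion). sigma_w2, sigma_b2 are hypothetical values."""
    d = len(x1)
    # Covariances of the first-layer pre-activations.
    k12 = sigma_b2 + sigma_w2 * sum(a * b for a, b in zip(x1, x2)) / d
    k11 = sigma_b2 + sigma_w2 * sum(a * a for a in x1) / d
    k22 = sigma_b2 + sigma_w2 * sum(b * b for b in x2) / d
    for _ in range(depth):
        norm = math.sqrt(k11 * k22)
        cos_t = max(-1.0, min(1.0, k12 / norm))  # clamp against rounding error
        theta = math.acos(cos_t)
        k12 = sigma_b2 + sigma_w2 / (2 * math.pi) * norm * (
            math.sin(theta) + (math.pi - theta) * cos_t)
        # On the diagonal theta = 0, so the update reduces to sigma_b2 + sigma_w2 / 2 * K.
        k11 = sigma_b2 + sigma_w2 / 2 * k11
        k22 = sigma_b2 + sigma_w2 / 2 * k22
    return k12


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_posterior_mean(x_train, y_train, x_test, depth, noise=1e-6):
    """Posterior mean k(x*, X) (K + noise*I)^-1 y; `noise` is a hypothetical jitter."""
    gram = [[nngp_relu_kernel(a, b, depth) + (noise if i == j else 0.0)
             for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(gram, y_train)
    return [sum(nngp_relu_kernel(t, a, depth) * al for a, al in zip(x_train, alpha))
            for t in x_test]


x_train = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]
y_train = [1.0, -1.0, 0.5]
print(gp_posterior_mean(x_train, y_train, [[0.5, 0.5]], depth=2))
```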
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which impose many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can achieve real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
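The abstract says the models were trained as INT8 but does not describe the quantization scheme. This is a minimal sketch, assuming a generic asymmetric per-tensor affine mapping. That is not necessarily what the challenge or the VS680 toolchain uses. The sketch shows how float activations map to 8-bit codes and back.

```python
def int8_affine_params(x_min, x_max, q_min=-128, q_max=127):
    """Scale and zero point of an asymmetric per-tensor INT8 mapping (assumed scheme)."""
    # Widen the range to contain 0 so that 0.0 is exactly representable (zero padding).
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min) or 1.0  # guard against an all-zero tensor
    zero_point = int(round(q_min - x_min / scale))
    return scale, max(q_min, min(q_max, zero_point))


def quantize(values, scale, zero_point, q_min=-128, q_max=127):
    """Map floats to clamped INT8 codes: q = round(x / scale) + zero_point."""
    return [max(q_min, min(q_max, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(codes, scale, zero_point):
    """Map INT8 codes back to floats: x ~ (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in codes]


acts = [-0.8, -0.1, 0.0, 0.35, 1.9]  # hypothetical activation values
scale, zp = int8_affine_params(min(acts), max(acts))
codes = quantize(acts, scale, zp)
print(scale, zp, codes, dequantize(codes, scale, zp))
```

The round-trip error stays within about one quantization step (scale). Real pipelines often calibrate x_min and x_max over many batches rather than a single tensor.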
Recent research has demonstrated that behavior signals captured by smartphones and wearables can support longitudinal behavior modeling. However, there is no comprehensive public dataset that serves as an open testbed for fair comparison among algorithms. Moreover, prior studies mainly evaluate algorithms using data from a single population within a short period, without measuring the cross-dataset generalizability of these algorithms. We present the first multi-year passive sensing datasets, containing data from 497 unique users (over 700 user-years) collected from mobile and wearable sensors, together with a wide range of well-being metrics. Our datasets can support multiple cross-dataset evaluations of behavior modeling algorithms' generalizability across different users and years. As a starting point, we provide the benchmark results of 18 algorithms on the task of depression detection. Our results indicate that both prior depression detection algorithms and domain generalization techniques show potential but need further research to achieve adequate cross-dataset generalizability. We envision that our multi-year datasets will help the ML community develop generalizable longitudinal behavior modeling algorithms.
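Cross-dataset evaluation can be organized in several ways. The abstract does not say exactly which protocols the benchmark uses. As an assumption, the sketch below enumerates two common ones:

- leave-one-dataset-out: train on all other datasets, test on the held-out one;
- pairwise transfer: train on one dataset, test on another.

It also provides balanced accuracy, a metric often used for imbalanced labels such as depression status. Dataset names here are hypothetical placeholders.

```python
from itertools import permutations


def leave_one_dataset_out(dataset_names):
    """Yield (train_names, test_name): train on all other datasets, test on the held-out one."""
    for held_out in dataset_names:
        yield [d for d in dataset_names if d != held_out], held_out


def pairwise_transfer(dataset_names):
    """Yield (train_name, test_name) for every ordered pair of distinct datasets."""
    yield from permutations(dataset_names, 2)


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall; less sensitive to class imbalance than plain accuracy."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == label]
        recalls.append(sum(y_pred[i] == label for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


names = ["D1", "D2", "D3", "D4"]  # hypothetical per-year dataset identifiers
for train, test in leave_one_dataset_out(names):
    print("train on", train, "-> test on", test)
```

A model would be fitted on each training set and scored on the held-out dataset. The spread of scores across splits then indicates how well the model generalizes.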
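We propose a lightweight single-image super-resolution network for mobile devices, named XCAT. XCAT introduces heterogeneous group convolution blocks with cross concatenation (HXBlock). The heterogeneous split of input channels across the group convolution blocks reduces the number of operations, and cross concatenation allows information to flow between the intermediate input tensors of cascaded HXBlocks. Cross concatenation inside HXBlocks also avoids more expensive operations such as 1x1 convolutions. To further avoid expensive tensor copy operations, XCAT uses non-trainable convolution kernels to apply upsampling operations. XCAT was designed with integer quantization in mind and also uses several training techniques, such as intensity-based data augmentation. Integer-quantized XCAT runs in 320 ms on a Mali-G71 MP2 GPU, and in 30 ms (NCHW) and 8.8 ms (NHWC) on the Synaptics Dolphin NPU, making it suitable for real-time applications.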
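A group convolution lets each channel group convolve only its own slice of channels. That is why splitting the input channels reduces the operation count. The sketch below counts multiply-accumulates (MACs) per output pixel for a dense 3x3 convolution and for heterogeneous and equal group splits. The 32-channel width and the [8, 24] split are hypothetical illustration values, not XCAT's actual configuration. Cross concatenation is not modeled.

```python
def conv_macs(h, w, k, c_in, c_out):
    """Multiply-accumulates of a dense k x k convolution producing an h x w output."""
    return h * w * k * k * c_in * c_out


def group_conv_macs(h, w, k, in_splits, out_splits):
    """MACs when channels are split into (possibly unequal) groups, each group
    convolving only its own slice of input channels into its own output slice."""
    assert len(in_splits) == len(out_splits)
    return sum(conv_macs(h, w, k, ci, co) for ci, co in zip(in_splits, out_splits))


# Per output pixel (h = w = 1), 3x3 kernels, 32 -> 32 channels (hypothetical widths).
dense = conv_macs(1, 1, 3, 32, 32)
hetero = group_conv_macs(1, 1, 3, [8, 24], [8, 24])
equal = group_conv_macs(1, 1, 3, [16, 16], [16, 16])
print(dense, hetero, equal)  # 9216 5760 4608
```

An unequal split saves less than an equal one but keeps a wider path for part of the channels. Where exactly to make that trade-off is a design choice the abstract leaves to the paper.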
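As machine learning and deep learning models have become highly prevalent in a multitude of domains, the main reservation about adopting them in decision-making processes is their black-box nature. The Explainable Artificial Intelligence (XAI) paradigm has gained a lot of momentum due to its ability to reduce model opacity. XAI methods have not only increased stakeholders' trust in the decision process but also helped developers ensure its fairness. Recent efforts have been invested in creating transparent models and post-hoc explanations. However, fewer methods have been developed for time series data, and even fewer for multivariate datasets. In this work, we take advantage of the inherent interpretability of shapelets to develop a model-agnostic multivariate time series (MTS) counterfactual explanation algorithm. Counterfactuals can have a tremendous impact on making black-box models explainable by indicating what changes have to be made to the input to change the final decision. We test our approach on a real-life solar flare prediction dataset and demonstrate that it produces high-quality counterfactuals. Moreover, a comparison with the only existing MTS counterfactual generation algorithm shows that, in addition to being visually interpretable, our explanations are superior in terms of proximity, sparsity, and plausibility.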
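One simple way to turn a shapelet into a counterfactual candidate is to paste a shapelet characteristic of the target class into one channel of the query series. The classifier's decision would then be re-checked (not shown). The sketch below does this and scores the result with two of the criteria named above. It is an illustrative assumption, not the paper's algorithm. The proximity and sparsity definitions used here (L1 distance and fraction of changed points) are also assumptions; definitions vary across papers.

```python
def substitute_segment(series, channel, start, shapelet):
    """Copy of a multivariate series (list of channels) with
    series[channel][start:start + len(shapelet)] replaced by the shapelet values."""
    cf = [list(ch) for ch in series]
    end = start + len(shapelet)
    if end > len(cf[channel]):
        raise ValueError("shapelet does not fit at this position")
    cf[channel][start:end] = shapelet
    return cf


def proximity_l1(original, counterfactual):
    """L1 distance between original and counterfactual (lower = closer)."""
    return sum(abs(a - b)
               for ch_o, ch_c in zip(original, counterfactual)
               for a, b in zip(ch_o, ch_c))


def sparsity(original, counterfactual):
    """Fraction of (channel, time) points that were changed (lower = sparser)."""
    total = changed = 0
    for ch_o, ch_c in zip(original, counterfactual):
        for a, b in zip(ch_o, ch_c):
            total += 1
            changed += a != b
    return changed / total


query = [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]  # hypothetical 2-channel series
cf = substitute_segment(query, channel=0, start=1, shapelet=[5.0, 5.0])
print(cf, proximity_l1(query, cf), sparsity(query, cf))
```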
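Recent advances in Transformer-based large language models (LLMs) have led to performance improvements across many tasks. These gains come with a drastic increase in model size, potentially leading to slow and costly use at inference time. In practice, however, the generations made by LLMs are composed of varying levels of difficulty. While certain predictions truly benefit from the model's full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early-exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to hidden representations that are missing due to early exits at previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute (a potential speedup of up to $\times 3$) while maintaining high performance.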
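The per-token exit decision in challenge (1) can be sketched as follows. Each decoder layer produces logits for the next token. Decoding stops at the first layer whose confidence reaches a threshold and otherwise falls back to the last layer. This is a minimal sketch under assumptions, not CALM's implementation:

- confidence is taken to be the gap between the two largest softmax probabilities, one plausible measure;
- the threshold is a hypothetical value.

Calibrating the threshold against sequence-level constraints (challenge 2) and handling missing hidden states (challenge 3) are not covered.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top2_gap(logits):
    """Assumed confidence measure: difference between the two largest softmax probabilities."""
    p = sorted(softmax(logits), reverse=True)
    return p[0] - p[1]


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def early_exit_layer(per_layer_logits, threshold):
    """Return (layer index, predicted token id) at the first layer whose confidence
    reaches `threshold`; fall back to the last layer if none does."""
    for i, logits in enumerate(per_layer_logits):
        if top2_gap(logits) >= threshold:
            return i, argmax(logits)
    last = len(per_layer_logits) - 1
    return last, argmax(per_layer_logits[last])


# Hypothetical next-token logits from three successive layers for one token.
layers = [[0.1, 0.2, 0.0], [0.0, 3.0, 0.0], [0.0, 6.0, 0.0]]
print(early_exit_layer(layers, threshold=0.5))  # (1, 1): exits after the second layer
```

Lowering the threshold exits earlier and saves more compute, at a greater risk of diverging from the full model's output.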
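Programming machines with commonsense reasoning (CSR) abilities is a longstanding challenge in the artificial intelligence community. Current CSR benchmarks use multiple-choice (and, in relatively fewer cases, generative) question-answering instances to evaluate machine commonsense. Recent advances in transformer-based language representation models suggest that considerable progress has been made on existing benchmarks. However, although dozens of CSR benchmarks currently exist, and their number is growing, it is not evident that the full suite of commonsense capabilities has been systematically evaluated. Furthermore, there are doubts about whether language models are "fitting" to a benchmark dataset's training partition by picking up on subtle but normatively irrelevant (at least for CSR) statistical features to achieve good performance on the test partition. To address these challenges, we propose a benchmark called Theoretically-Grounded Commonsense Reasoning (TG-CSR) that is also based on discriminative question answering, but with questions designed to evaluate diverse aspects of commonsense, such as space, time, and world states. TG-CSR is based on a subset of the commonsense categories first proposed by Gordon and Hobbs. The benchmark is also designed to be few-shot (and, in the future, zero-shot), with only a few training and validation examples provided. This report discusses the structure and construction of the benchmark. Preliminary results suggest that the benchmark is challenging even for advanced language representation models designed for discriminative CSR question-answering tasks. Benchmark access and leaderboard: https://codalab.lisn.upsaclay.fr/competitions/3080 Benchmark website: https://usc-isi-i2.github.io/tgcsr/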
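Automatic road graph extraction from aerial and satellite images is a long-standing challenge. Existing algorithms are either based on pixel-level segmentation followed by vectorization, or on iterative graph construction using next-move prediction. Both strategies suffer from severe drawbacks, in particular high computing resource requirements and incomplete outputs. By contrast, we propose a method that directly infers the final road graph in a single pass. The key idea consists in combining a fully convolutional network in charge of locating points of interest, such as intersections, dead ends, and turns, with a graph neural network that predicts links between these points. This strategy is more efficient than iterative methods and allows us to streamline the training process by removing the need to generate starting locations, while keeping training end-to-end. We evaluate our method against existing work on the popular RoadTracer dataset and achieve competitive results. We also benchmark its speed and show that it outperforms existing approaches. This opens the possibility of in-flight processing on embedded devices.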
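The single-pass pipeline has two stages: locate points of interest, then predict links between them. Decoding their outputs into a graph can be sketched as below. It assumes the FCN emits a point heatmap and that link probabilities for each point pair are already available. A hand-written matrix stands in for the GNN's output, and both thresholds are hypothetical. The 3x3 non-maximum suppression is a common choice for heatmap peaks, not necessarily the authors'.

```python
def find_keypoints(heatmap, threshold=0.5):
    """Return (row, col) of cells with value >= threshold that are also >= all
    8 neighbours (3x3 non-maximum suppression). Plateaus may yield duplicates."""
    rows, cols = len(heatmap), len(heatmap[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            v = heatmap[r][c]
            if v < threshold:
                continue
            neighbours = [heatmap[rr][cc]
                          for rr in range(max(0, r - 1), min(rows, r + 2))
                          for cc in range(max(0, c - 1), min(cols, c + 2))
                          if (rr, cc) != (r, c)]
            if all(v >= n for n in neighbours):
                points.append((r, c))
    return points


def build_graph(points, link_prob, threshold=0.5):
    """Undirected edges (i, j), i < j, whose predicted link probability exceeds threshold."""
    return [(i, j)
            for i in range(len(points))
            for j in range(i + 1, len(points))
            if link_prob[i][j] > threshold]


# Hypothetical 5x5 heatmap with two road points of interest.
hm = [[0.0] * 5 for _ in range(5)]
hm[1][1], hm[1][2], hm[3][3] = 0.9, 0.4, 0.8
pts = find_keypoints(hm)
print(pts, build_graph(pts, [[0.0, 0.7], [0.7, 0.0]]))
```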